Bridging the Semantic Gap with SQL Query Logs in Natural Language Interfaces to Databases
A critical challenge in constructing a natural language interface to database
(NLIDB) is bridging the semantic gap between a natural language query (NLQ) and
the underlying data. Two specific ways this challenge manifests itself are
keyword mapping and join path inference. Keyword mapping is the task of
mapping individual keywords in the original NLQ to database elements (such as
relations, attributes or values). It is challenging due to the ambiguity in
mapping the user's mental model and diction to the schema definition and
contents of the underlying database. Join path inference is the process of
selecting the relations and join conditions in the FROM clause of the final SQL
query, and is difficult because NLIDB users lack knowledge of the database
schema or SQL and therefore cannot explicitly specify the intermediate tables
and joins needed to construct a final SQL query. In this paper, we propose
leveraging information from the SQL query log of a database to enhance the
performance of existing NLIDBs with respect to these challenges. We present a
system Templar that can be used to augment existing NLIDBs. Our extensive
experimental evaluation demonstrates the effectiveness of our approach, yielding
up to 138% improvement in top-1 accuracy in existing NLIDBs by leveraging SQL
query log information.

Comment: Accepted to IEEE International Conference on Data Engineering (ICDE) 201
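The abstract describes using a database's SQL query log to resolve ambiguous keyword-to-schema mappings. As a minimal sketch of that underlying idea — prefer the candidate schema element that historical queries reference most often — one might write the following; the function name, inputs, and scoring are illustrative assumptions, not Templar's actual algorithm:

```python
from collections import Counter

def score_mappings(keyword_candidates, query_log):
    """Pick, for each NLQ keyword, the candidate database element that
    logged SQL queries reference most often.

    keyword_candidates: dict mapping a keyword to its candidate elements.
    query_log: list of sets, each the schema elements one logged query used.
    """
    usage = Counter()
    for elements in query_log:
        usage.update(elements)
    # A Counter returns 0 for unseen elements, so unlogged candidates rank last.
    return {kw: max(cands, key=lambda e: usage[e])
            for kw, cands in keyword_candidates.items()}
```

For example, if the log shows two past queries touching `paper.title`, the keyword "papers" would map there rather than to a never-queried element. A real system would combine such log statistics with lexical similarity rather than rely on frequency alone.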
Deep Convective Transport and Wet Scavenging in Different Convective Regimes During the DC3 Field Campaign
Deep convective transport of surface moisture and pollution from the planetary boundary layer to the upper troposphere and lower stratosphere affects the radiation budget and climate. Firstly, I analyzed the deep convective transport through cloud-resolved simulations of three different convective regimes from the 2012 Deep Convective Clouds and Chemistry (DC3) field campaign: an airmass thunderstorm, a supercell storm, and a mesoscale convective system (MCS). Analysis of vertical flux divergence shows that deep convective transport in the supercell case is the strongest per unit area, while transport of boundary layer insoluble trace gases is relatively weak in the MCS due to the injection of clean air into the mid-troposphere by a strong rear inflow jet. Additionally, forward and backward trajectories are used to determine the source of the upper-level detrained air.
My second focus is using cloud-parameterized Weather Research and Forecasting model coupled with Chemistry (WRF-Chem) simulations to analyze subgrid deep convective transport in the supercell and MCS cases. Based on the precipitation results, the best WRF simulation of these storms was obtained using the Grell-Freitas (GF) convective scheme. The default subgrid convective transport scheme was replaced with a scheme that computes convective transport within the GF subgrid cumulus parameterization, which improved the transport simulations. The results demonstrate the importance of making subgrid convective transport consistent with the convective parameterization in regional models. Moreover, subgrid-scale convective transport played a more significant role in the supercell case than in the MCS case.
I evaluated the model-simulated subgrid wet scavenging of soluble trace gases (such as HNO3, CH2O, CH3OOH, H2O2, and SO2) in the supercell case, and improved subgrid wet scavenging by determining appropriate ice retention factors and by adjusting the conversion rate of cloud water to rain water. The introduction of the ice retention factors greatly improved the model simulation of less soluble species (e.g., it decreased the CH2O simulation error by 12% and the CH3OOH simulation error by 63%). Finally, I conducted a > 24-hour simulation to examine downwind ozone production and its sensitivity to the ice retention factors.
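The role of the ice retention factor described above — the fraction of a dissolved gas that stays in the condensate when it freezes instead of being released back to the gas phase — can be illustrated with a toy bookkeeping function. This is a simplified sketch of the concept, not the WRF-Chem wet-scavenging parameterization:

```python
def scavenged_fraction(dissolved_fraction, frozen_fraction, retention_factor):
    """Fraction of a soluble trace gas removed by precipitation.

    dissolved_fraction: fraction of the gas taken up by condensate.
    frozen_fraction:    fraction of that condensate which freezes.
    retention_factor:   fraction of dissolved gas retained in ice on
                        freezing (0 = fully degassed, 1 = fully retained).
    """
    liquid = dissolved_fraction * (1.0 - frozen_fraction)
    ice = dissolved_fraction * frozen_fraction * retention_factor
    return liquid + ice
```

With mostly frozen condensate, a retention factor of 0 leaves most of a moderately soluble gas (like CH2O) unscavenged, while a factor of 1 removes everything dissolved — which is why choosing appropriate retention factors changes the simulated mixing ratios of the less soluble species so strongly.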
Small but Mighty: New Benchmarks for Split and Rephrase
Split and Rephrase is a text simplification task of rewriting a complex
sentence into simpler ones. As the task is relatively new, it is paramount to ensure
the soundness of its evaluation benchmark and metric. We find that the widely
used benchmark dataset universally contains easily exploitable syntactic cues
caused by its automatic generation process. Taking advantage of such cues, we
show that even a simple rule-based model can perform on par with the
state-of-the-art model. To remedy such limitations, we collect and release two
crowdsourced benchmark datasets. We not only make sure that they contain
significantly more diverse syntax, but also carefully control for their quality
according to a well-defined set of criteria. While no satisfactory automatic
metric exists, we apply fine-grained manual evaluation based on these criteria
using crowdsourcing, showing that our datasets better represent the task and
are significantly more challenging for the models.

Comment: In EMNLP 202
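The abstract's claim that a simple rule-based model can exploit syntactic cues in an automatically generated benchmark can be made concrete with a toy splitter that cuts at an easily detected conjunction boundary. This is a deliberately naive illustration of the kind of rule involved, not the authors' actual baseline:

```python
import re

def naive_split(sentence):
    """Split a complex sentence at the first top-level ", and" / ", but"
    boundary and emit two simpler sentences. Rules this shallow can score
    well on benchmarks whose complex sentences were generated by joining
    simple ones with a fixed set of connectives.
    """
    parts = re.split(r",\s+(?:and|but)\s+", sentence.rstrip("."), maxsplit=1)
    # Re-capitalize and re-punctuate each fragment as its own sentence.
    return [p[0].upper() + p[1:] + "." for p in parts]
```

A benchmark whose gold splits align with such surface cues cannot distinguish this rule from a learned model, which is exactly the weakness the crowdsourced replacement datasets are designed to remove.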
I2MD: 3D Action Representation Learning with Inter- and Intra-modal Mutual Distillation
Recent progress on self-supervised 3D human action representation learning
is largely attributed to contrastive learning. However, in conventional
contrastive frameworks, the rich complementarity between different skeleton
modalities remains under-explored. Moreover, optimized by distinguishing
self-augmented samples, models struggle with the numerous similar positive
instances that arise when action categories are limited. In this work, we tackle the
aforementioned problems by introducing a general Inter- and Intra-modal Mutual
Distillation (I2MD) framework. In I2MD, we first re-formulate the
cross-modal interaction as a Cross-modal Mutual Distillation (CMD) process.
Unlike existing distillation solutions that transfer the knowledge of a
pre-trained and fixed teacher to the student, in CMD the knowledge is
continuously updated and bidirectionally distilled between modalities during
pre-training. To alleviate the interference of similar samples and exploit
their underlying contexts, we further design the Intra-modal Mutual
Distillation (IMD) strategy. In IMD, the Dynamic Neighbors Aggregation (DNA)
mechanism is first introduced, where an additional cluster-level discrimination
branch is instantiated in each modality. It adaptively aggregates
highly-correlated neighboring features, forming local cluster-level
contrasting. Mutual distillation is then performed between the two branches for
cross-level knowledge exchange. Extensive experiments on three datasets show
that our approach sets a series of new records.

Comment: Submitted to IJCV. arXiv admin note: substantial text overlap with arXiv:2208.1244
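The core idea of mutual distillation without a fixed teacher — each modality's softened output distribution supervising the other, symmetrically — can be sketched with a symmetric KL loss over two branches' logits. This is a schematic of the concept in plain Python, not the paper's exact loss:

```python
import math

def softmax(logits, temperature):
    """Temperature-softened probability distribution over the logits."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) between two distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_distillation_loss(logits_a, logits_b, temperature=4.0):
    """Bidirectional distillation between two modality branches: A teaches
    B and B teaches A in the same step, so neither is a frozen teacher and
    the 'knowledge' is continuously updated during pre-training."""
    pa = softmax(logits_a, temperature)
    pb = softmax(logits_b, temperature)
    return kl(pa, pb) + kl(pb, pa)
```

The loss is zero when both branches already agree and grows with their disagreement, pulling the two modalities' similarity structures toward each other from both sides.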
Masked Motion Predictors are Strong 3D Action Representation Learners
In 3D human action recognition, limited supervised data makes it challenging
to fully tap into the modeling potential of powerful networks such as
transformers. As a result, researchers have been actively investigating
effective self-supervised pre-training strategies. In this work, we show that
instead of following the prevalent pretext task to perform masked
self-component reconstruction in human joints, explicit contextual motion
modeling is key to the success of learning effective feature representation for
3D action recognition. Formally, we propose the Masked Motion Prediction (MAMP)
framework. To be specific, the proposed MAMP takes as input the masked
spatio-temporal skeleton sequence and predicts the corresponding temporal
motion of the masked human joints. Considering the high temporal redundancy of
the skeleton sequence, in our MAMP, the motion information also acts as an
empirical semantic richness prior that guides the masking process, promoting
better attention to semantically rich temporal regions. Extensive experiments
on NTU-60, NTU-120, and PKU-MMD datasets show that the proposed MAMP
pre-training substantially improves the performance of the adopted vanilla
transformer, achieving state-of-the-art results without bells and whistles. The
source code of our MAMP is available at https://github.com/maoyunyao/MAMP.

Comment: To appear in ICCV 202
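The prediction target the abstract describes — the temporal motion of the masked joints rather than the joints themselves — amounts to regressing first-order frame differences at the masked time steps. A minimal sketch of computing such targets, with illustrative data layout assumptions (one flat list of joint coordinates per frame), not MAMP's actual implementation:

```python
def motion_targets(sequence, masked_frames):
    """Temporal motion (frame difference) of the joints at masked steps.

    sequence: list of frames, each a flat list of joint coordinates.
    masked_frames: iterable of time indices whose motion is to be predicted.
    Returns {t: motion vector from frame t-1 to frame t}.
    """
    targets = {}
    for t in masked_frames:
        if t == 0:
            continue  # no preceding frame to difference against
        targets[t] = [cur - prev
                      for cur, prev in zip(sequence[t], sequence[t - 1])]
    return targets
```

Because consecutive skeleton frames are highly redundant, these difference vectors carry the non-trivial signal; frames where they are large are the semantically rich regions the abstract says the masking prior favors.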
Pronounced Increases in Nitrogen Emissions and Deposition Due to the Historic 2020 Wildfires in the Western U.S.
Wildfire outbreaks can lead to extreme biomass burning (BB) emissions of both oxidized (e.g., nitrogen oxides; NOx = NO + NO2) and reduced-form (e.g., ammonia; NH3) nitrogen (N) compounds. High N emissions are major concerns for air quality, atmospheric deposition, and consequential human and ecosystem health impacts. In this study, we use both satellite-based observations and modeling results to quantify the contribution of BB to the total emissions, and approximate the impact on total N deposition in the western U.S. Our results show that during the 2020 wildfire season of August–October, BB contributes significantly to the total emissions, with a satellite-derived fraction of NH3 to the total reactive N emissions (median ~40%) in the range of aircraft observations. During the peak of the western August Complex Fires in September, BB contributed to ~55% (for the contiguous U.S.) and ~83% (for the western U.S.) of the monthly total NOx and NH3 emissions. Overall, there is good model performance of the George Mason University Wildfire Forecasting System (GMU-WFS) used in this work. The extreme BB emissions lead to significant contributions to the total N deposition for different ecosystems in California, with an average August–October 2020 relative increase of ~78% (from 7.1 to 12.6 kg ha−1 year−1) in deposition rate to major vegetation types (mixed forests + grasslands/shrublands/savanna) compared to the GMU-WFS simulations without BB emissions. For mixed forest types only, the average N deposition rate increases (from 6.2 to 16.9 kg ha−1 year−1) are even larger, at ~173%. Such large N deposition due to extreme BB emissions is much (~6-12 times) larger than low-end critical load thresholds for major vegetation types (e.g., forests at 1.5-3 kg ha−1 year−1), and thus may result in adverse N deposition effects across larger areas of lichen communities found in California's mixed conifer forests.
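The headline percentages above follow directly from the reported deposition rates; a one-line helper makes the arithmetic explicit (the function is illustrative, using only numbers stated in the abstract):

```python
def relative_increase(baseline, with_bb):
    """Percent increase in N deposition rate (kg/ha/yr) when
    biomass-burning emissions are included in the simulation."""
    return (with_bb - baseline) / baseline * 100.0
```

Applied to the abstract's values, 7.1 → 12.6 kg ha−1 year−1 gives roughly 78%, and 6.2 → 16.9 kg ha−1 year−1 for mixed forests gives roughly 173%, matching the quoted figures.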
FLEEK: Factual Error Detection and Correction with Evidence Retrieved from External Knowledge
Detecting factual errors in textual information, whether generated by large
language models (LLM) or curated by humans, is crucial for making informed
decisions. LLMs' inability to attribute their claims to external knowledge and
their tendency to hallucinate makes it difficult to rely on their responses.
Humans, too, are prone to factual errors in their writing. Since manual
detection and correction of factual errors is labor-intensive, developing an
automatic approach can greatly reduce human effort. We present FLEEK, a
prototype tool that automatically extracts factual claims from text, gathers
evidence from external knowledge sources, evaluates the factuality of each
claim, and suggests revisions for identified errors using the collected
evidence. Initial empirical evaluation on fact error detection (77-85% F1)
shows the potential of FLEEK. A video demo of FLEEK can be found at
https://youtu.be/NapJFUlkPdQ.

Comment: EMNLP 2023 (Demonstration Track)
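The four-stage pipeline the abstract describes — extract claims, gather evidence, judge factuality, suggest revisions — can be sketched as a skeleton that wires the stages together. The stage interfaces (callables, the "refuted" verdict label) are assumptions for illustration, not FLEEK's actual API:

```python
def fact_check(text, extract_claims, retrieve_evidence, verify, revise):
    """FLEEK-style pipeline skeleton: each stage is a pluggable callable.
    A revision is proposed only for claims the verifier marks "refuted"."""
    report = []
    for claim in extract_claims(text):
        evidence = retrieve_evidence(claim)
        verdict = verify(claim, evidence)
        suggestion = revise(claim, evidence) if verdict == "refuted" else None
        report.append({"claim": claim,
                       "verdict": verdict,
                       "suggestion": suggestion})
    return report
```

Keeping the stages as separate callables mirrors the tool's structure: the same driver works whether evidence comes from a knowledge graph or web search, and whether verification is rule-based or LLM-based.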